RAND 


The  Design  of  a  Multilevel 
Longitudinal  Survey  of 
Children,  Families,  and 
Communities: 

The  Los  Angeles  Family  and 
Neighborhood  Survey 


Narayan  Sastry 
Bonnie  Ghosh-Dastidar 
John  Adams 
Anne  Pebley 


Distribution Unlimited


Labor  and  Population  Program 

Working  Paper  Series  00-18 


DRU-2400/1-LAFANS 


November  2000 



The  RAND  unrestricted  draft  series  is  intended  to  transmit 
preliminary  results  of  RAND  research.  Unrestricted  drafts 
have  not  been  formally  reviewed  or  edited.  The  views  and 
conclusions  expressed  are  tentative.  A  draft  should  not  be 
cited  or  quoted  without  permission  of  the  author,  unless  the 
preface  grants  such  permission. 

RAND is a nonprofit institution that helps improve policy and decisionmaking through research and analysis.

RAND's publications and drafts do not necessarily reflect the opinions or policies of its research sponsors.


THE  DESIGN  OF  A  MULTILEVEL  LONGITUDINAL  SURVEY  OF 
CHILDREN,  FAMILIES,  AND  COMMUNITIES:  THE  LOS  ANGELES 
FAMILY AND NEIGHBORHOOD SURVEY†


October  2000 


Narayan  Sastry 
Bonnie  Ghosh-Dastidar 
John  Adams 
Anne  Pebley 


RAND 

P.O. Box 2138
Santa  Monica,  CA  90407 


† Primary funding for L.A.FANS comes from NICHD Grant R01 HD35944. Additional funding has been
provided by the Assistant Secretary for Planning and Evaluation, DHHS, and by Los Angeles County.


Abstract. In the last ten years, there has been growing interest in the role of neighborhoods in
shaping a variety of outcomes for families, adults, and children. Although theoretical
perspectives are well advanced and the basic statistical methods for modeling neighborhood
effects are in place, a major shortcoming concerns the limitations of existing datasets. Recent
studies concerned with understanding children’s outcomes have not been designed with the
explicit goal of supporting multilevel modeling. This makes it difficult to address the most
important unresolved research issue in this area: understanding the causal effects of
neighborhood factors. In this paper, we describe the sampling design of the
Los Angeles Family and Neighborhood Survey (L.A.FANS), a new survey of children, families,
and neighborhoods in Los Angeles County. This survey was designed explicitly to support
multilevel studies on a number of topics, including child development, residential mobility, and
welfare reform. The study is longitudinal and includes a baseline survey and several follow-up
waves, which will track previously interviewed respondents and will include a sample of new
entrants into the sampled neighborhoods. We highlight the main design and analytical
considerations that shaped the study. We also describe the results of an in-depth statistical
investigation, carried out as part of the study design, of the survey’s ability to support
multilevel analyses.




1.  Introduction 

In the last ten years, there has been growing interest in the role of neighborhoods in shaping
a variety of outcomes for families, adults, and children. The broad set of outcomes that have been
studied includes violent crime (Sampson, Raudenbush, and Earls, 1997), educational attainment
(Garner and Raudenbush, 1991), domestic violence (O’Campo et al., 1995), fertility (Billy and
Moore, 1992), residential mobility (Lee et al., 1994), and children’s development (Duncan, Brooks-Gunn,
and Klebanov, 1994). Studies examining neighborhood effects have considered not only the
U.S. but also other countries (e.g., Pebley et al., 1996; Sampson and Groves, 1989; Sastry,
1996). From both a scientific and a policy perspective, understanding the effects of neighborhoods
is potentially very important. Research in this area has, however, failed to produce
persuasive and consistent results about the nature and strength of neighborhood effects, especially
regarding children’s outcomes (Jencks and Mayer, 1990; Duncan and Raudenbush, 1998;
Furstenberg and Hughes, 1997; Gephart, 1997). Although theoretical perspectives are well
advanced and the basic statistical methods for modeling neighborhood effects are in place, a major
shortcoming concerns the limitations of existing datasets. The Los Angeles Family and
Neighborhood Survey (L.A.FANS) was designed to overcome many of these limitations.

The most recent theoretical work on neighborhood effects on children’s development
and well-being highlights the role of child- and family-related institutions, social organization and
interaction, and the normative environment. Child- and family-related institutions include schools,
child care providers, recreational programs and activities, religious institutions, and social service
providers. These institutions may play an important role in shaping children’s outcomes by
providing access to services and resources. Neighborhoods with high levels of social organization
and interaction may be effective in attracting more and better institutions, even if they
are disadvantaged in terms of income or education. In addition, stronger social organization may
promote social ties and the active support and social control of children (Coleman, 1988; Sampson
and Morenoff, 1997; Wilson, 1987 and 1996). Finally, the normative environment, which is shaped
by a concentration of people with similar positive or negative outlooks and behaviors, may be a key
element linking neighborhood composition and child outcomes (e.g., Crane, 1991).

Multilevel models are now widely used to study the effects of neighborhood characteristics
on child outcomes. Unlike standard regression techniques, these models account for the
correlation among observations that share a higher-level unit (for example, children in the same
family or families in the same neighborhood) that remains after controlling for measured
characteristics. Other statistical approaches, such as generalized estimating equations (Liang and
Zeger, 1986) or bootstrapping (Efron and Tibshirani, 1993), also correct for this correlation.
Multilevel models, however, also provide a straightforward method for estimating the strength of
this correlation. This information is useful for gauging the importance of unmeasured or
unmeasurable factors that are not included in the model but are picked up by the multilevel model’s
random effects because they are shared among children belonging to the same family or living in
the same neighborhood.1
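To make concrete what "the strength of this correlation" refers to, the sketch below estimates an intra-cluster correlation (ICC) from clustered data. It is an illustration only, not the method used for L.A.FANS: it swaps in the classical one-way ANOVA (method-of-moments) estimator instead of fitting a multilevel model, assumes equal cluster sizes, and runs on simulated data. The names `icc_anova` and `simulate_clusters` are hypothetical.

```python
import random
import statistics


def icc_anova(groups):
    """One-way ANOVA (method-of-moments) estimate of the intra-cluster correlation.

    `groups` holds one list of outcomes per cluster (e.g., children within
    families, or families within neighborhoods). Assumes equal cluster sizes.
    """
    k = len(groups)
    m = len(groups[0])
    if any(len(g) != m for g in groups):
        raise ValueError("this sketch assumes equal cluster sizes")
    group_means = [statistics.fmean(g) for g in groups]
    grand_mean = statistics.fmean(group_means)
    # Between-cluster and within-cluster mean squares.
    msb = m * sum((gm - grand_mean) ** 2 for gm in group_means) / (k - 1)
    msw = sum((y - gm) ** 2 for g, gm in zip(groups, group_means) for y in g) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)


def simulate_clusters(k, m, icc, seed=0):
    """Hypothetical data: y = shared cluster effect + individual noise, total variance 1."""
    rng = random.Random(seed)
    groups = []
    for _ in range(k):
        u = rng.gauss(0.0, icc ** 0.5)  # component shared by everyone in the cluster
        groups.append([u + rng.gauss(0.0, (1 - icc) ** 0.5) for _ in range(m)])
    return groups


print(round(icc_anova(simulate_clusters(k=200, m=20, icc=0.2)), 3))
```

A random-effects model fitted by (restricted) maximum likelihood would give a comparable estimate while also handling unequal cluster sizes and covariates, which is why multilevel software is the usual choice in practice.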


1 In the context of studying family and neighborhood effects on children’s development, unobserved family-level
effects may capture the family’s motivation for their children to succeed. At the neighborhood level, unmeasured
factors may reflect how much the community values children’s learning and achievement.




Advances  in  statistical  software  over  the  past  few  years  have  now  made  it  straightforward  to 
estimate  multilevel  models. 

The widespread use of multilevel models in social science research is facilitated by the
universal practice of using multistage, clustered sampling schemes in standard household
surveys. These schemes balance a trade-off between the efficiency of fieldwork operations,
which argues for concentrating the sample in a relatively small number of compact areas, and
the size of design effects, which increase with cluster size and thereby reduce the effective
sample size.

From the perspective of designing a survey to support the estimation of multilevel models,
several additional considerations are relevant. These include the added costs of collecting
community data and the need for enough observations within each cluster to estimate certain
community-level explanatory variables. For many potentially important neighborhood variables,
averaging individual responses is the only way to construct neighborhood measures; larger samples
per cluster yield more precise estimates of these measures. Some, but not all, of these issues
have been addressed in a new line of research that extends standard sample survey methods to the
additional problems raised by designing surveys for analysis with multilevel models (e.g., Cohen,
1998; Mok, 1995; Snijders and Bosker, 1993).
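The claim that larger per-cluster samples give more precise neighborhood measures can be quantified with the standard Spearman-Brown expression for the reliability of a group mean, n·ICC / (1 + (n − 1)·ICC). The sketch below is a minimal illustration under that formula; the function name is hypothetical, and the ICC values are illustrative choices taken from the 0.01 to 0.05 range this paper assumes for intra-community correlation.

```python
def neighborhood_mean_reliability(n_per_cluster: int, icc: float) -> float:
    """Reliability of a neighborhood mean formed by averaging n individual reports.

    Spearman-Brown form: n * ICC / (1 + (n - 1) * ICC). Values near 1 mean the
    averaged measure mostly reflects true between-neighborhood differences.
    """
    return n_per_cluster * icc / (1 + (n_per_cluster - 1) * icc)


# Illustrative ICC values (assumed, not estimated from L.A.FANS data).
for icc in (0.01, 0.05):
    for n in (10, 25, 50, 100):
        r = neighborhood_mean_reliability(n, icc)
        print(f"ICC={icc:.2f}  n={n:>3}  reliability={r:.2f}")
```

Under these assumptions, 50 reports per tract give a reliability of about 0.72 when ICC = 0.05 but only about 0.34 when ICC = 0.01, which illustrates why cluster size matters for measures built by aggregation.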

With very few exceptions, most notably the Project on Human Development in Chicago
Neighborhoods, recent studies concerned with understanding children’s outcomes have not been
designed with the explicit goal of supporting multilevel modeling. Moreover, few major household
surveys have collected community data at the same time; rather, community measures were usually
assembled after the fact, based primarily on census data and occasionally on administrative data
sources. As we explain below, this makes it difficult to address the most important unresolved
research issue in this area: understanding the causal effects of neighborhood factors. Causal
inference is complicated by the potential endogeneity of the availability and quality of
institutions (through the targeting of programs, for example) and by the processes of residential
selection and neighborhood change.

In this paper, we describe the sampling design of the Los Angeles Family and Neighborhood
Survey (L.A.FANS), a new survey of children, families, and neighborhoods in Los Angeles County.
This survey was designed explicitly to support multilevel studies on a number of topics. The study
is longitudinal and includes a baseline survey and several follow-up waves. We highlight the main
design and analytical considerations that shaped the study. We also describe the results of an
in-depth statistical investigation, carried out as part of the study design, of the survey’s ability
to support multilevel analyses.

We  begin,  in  the  next  section,  by  describing  the  main  goals  of  the  L.A.FANS.  In  Section  3, 
we  provide  details  on  key  sample  design  issues,  including  the  definition  of  neighborhoods,  sample 
size,  the  number  of  sampled  communities,  stratification,  selection  of  household  survey  respondents, 
and  follow-up  rules.  We  end  the  paper  with  some  brief  conclusions. 

2.  The  Los  Angeles  Family  and  Neighborhood  Survey 

The Los Angeles Family and Neighborhood Survey is a longitudinal study of families in Los
Angeles County and of the neighborhoods in which they live. The L.A.FANS is specifically
designed to answer key research and policy questions in three areas: the effects of neighborhoods
and families on children’s development; the effects of welfare reform at the neighborhood level; and
the process of residential mobility and neighborhood change.

Neighborhoods  and  families  are  thought  to  have  a  substantial  effect  on  children’s  and  teens’ 
behavior and health, their attitudes toward education and work, their chances of becoming a teenage
parent,  and  their  educational  and  employment  opportunities.  Yet,  as  discussed  above,  evidence 
about  the  influence  of  families,  neighborhoods,  and  peers  on  these  outcomes  is  limited.  The 
L.A.FANS  will  trace  the  neighborhood  and  family  roots  of  children’s  successes  and  failures  in 
several  areas:  cognitive  development,  school  performance,  behavioral  and  emotional  development, 
health,  youth  violence  and  crime,  drug  and  alcohol  abuse,  and  adolescent  pregnancy. 

The  effects  of  welfare  reform  are  likely  to  vary  greatly  among  neighborhoods  in  Los  Angeles 
County  because  of  differences  in  the  availability  of  employment,  transportation,  day  care,  and 
private  social  service  providers.  The  L.A.FANS  is  designed  to  measure,  over  time,  local-level 
differences  in  the  response  to  and  effects  of  welfare  reform  in  Los  Angeles  County. 

Moving  from  one  town  or  neighborhood  to  another  can  be  an  important  means  of  upward  (or 
downward)  social  mobility.  Residential  mobility  can  also  change  the  character  of  neighborhoods  for 
those who live there, particularly in terms of socioeconomic and racial/ethnic composition. While
general  patterns  of  residential  mobility  are  well  known,  there  is  little  information  about  factors 
behind  the  choices  that  families  make  about  moving  or  staying,  and  where  to  move.  Local-level 
mobility  patterns  of  new  immigrant  families  are  another  important  issue  on  which  there  is  little 
information.  The  L.A.FANS  will  provide  micro-level  data  to  study  residential  mobility, 
neighborhood  selection,  the  processes  leading  to  residential  segregation,  and  migration  patterns  of 
recent  immigrant  families. 

Existing  data  sets  do  not  provide  the  necessary  information  to  study  these  three  main 
research  issues  satisfactorily.  Their  shortcomings  arise  from  having  cross-sectional  designs  and 
inadequate  or  incomplete  measures  of  important  child,  family,  and  neighborhood  characteristics. 
Most  important,  however,  is  that  they  are  plagued  by  selection  effects  that  emerge  over  time 
through  the  process  of  residential  mobility.  Because  families  choose  the  neighborhoods  in  which 
they  live — and  this  choice  may  be  related  to  the  outcome  of  interest,  such  as  children’s  development 
or  well-being — it  is  necessary  to  understand  how  families  select  their  neighborhood  of  residence  in 
order  to  understand  the  effects  of  neighborhood  characteristics  on  outcomes.  Thus,  to  understand 
the  effects  of  neighborhoods,  it  is  necessary  to  also  collect  detailed  data  on  the  process  of  residential 
choice,  as  the  L.A.FANS  does.  Finally,  existing  surveys  that  support  detailed  studies  on  children, 
families,  and  neighborhoods  focus  almost  exclusively  on  cities  on  the  East  Coast  and  in  the 
Midwest. Los Angeles is very different from these older cities in a number of key dimensions, such as
physical  layout,  ethnic  mix,  political  structure,  and  patterns  of  social  interaction  and  daily  life. 
Studies  focusing  on  Los  Angeles  will  provide  insights  into  the  effects  of  neighborhoods  in  newer 
cities  in  the  West  and  Southwest  of  the  U.S.  Furthermore,  given  that  Los  Angeles  leads  the  nation 
in  many  important  trends — for  example,  in  terms  of  suburbanization  and  urban  sprawl — the  data 
from  L.A.FANS  could  be  used  to  understand  one  possible  future  for  cities  around  the  country. 

Fieldwork for the baseline wave of L.A.FANS began in April 2000 and will be completed by
the spring of 2001. A second wave will be fielded during roughly the same months in 2002. A third
wave is tentatively planned for 2004. A public use version of the data will be prepared
and released a few months after the completion of each wave.2 Each wave of data will include the
collection or assembly of three interrelated data sets: (1) a household survey, (2) a neighborhood
survey, and (3) a file of neighborhood services and characteristics (NSC) based on administrative and
other records. Our focus in this paper is the sampling plan for the household survey.

3.  L.A.FANS  Sample  Design 

The  L.A.FANS  was  designed  as  a  multilevel  survey,  first  sampling  neighborhoods,  then 
selecting  families  within  these  neighborhoods,  and  finally  sampling  children  within  these  families. 

As  discussed  below  in  greater  detail,  the  multistage,  clustered  sampling  scheme  has  several  strengths, 
as  well  as  certain  limitations.  Two  strengths  are  worth  mentioning  here.  First,  it  provides  an 
efficient  and  cost-effective  method  for  collecting  detailed  information  about  households  and 
neighborhoods  because  the  sample  is  concentrated  in  a  relatively  small  number  of  locations.  Second, 
family and neighborhood clustering provides researchers with the opportunity to control for
unmeasured  or  unmeasurable  factors  at  the  family  and  neighborhood  levels  using,  for  example,  fixed 
effects  or  random  effects  models. 

In  this  section,  we  discuss:  the  study  site;  the  definition  of  neighborhoods  that  was  used  for 
sampling  purposes;  the  sample  size;  the  number  of  neighborhoods  in  the  sample;  stratification;  the 
sampling  of  tracts,  blocks,  and  households;  the  selection  of  household  respondents;  and  the 
longitudinal  design  of  the  study. 

Study  Site 

L.A.FANS focuses on the County of Los Angeles, California. Los Angeles County is the
most populous county in the United States, with an estimated 1997 population of 8.8 million. It is
also tremendously diverse in terms of racial and ethnic composition. In 1997 the population was 41
percent Latino, 34 percent white, 13 percent Asian, and 10 percent black. Southern California is a
major destination for immigrants to the U.S. According to the 1990 Census, 38 percent of adults in
Los Angeles County were foreign born, as were 17 percent of children.

Defining  Neighborhoods 

As in many other urban areas, neighborhood boundaries are not clearly defined in Los
Angeles County. This has important implications for both the sample design and the subsequent
analyses. For sampling purposes, neighborhood units must be well-defined local geographic areas
for which up-to-date population and poverty estimates can be produced. In Los Angeles County,
these units include cities, zip codes, elementary school attendance areas (ESAAs), and census
tracts, block groups, and blocks.3 After examining maps, visiting several areas of the county, and
consulting Los Angeles experts, we concluded that census tracts and ESAAs most closely
approximate social definitions of neighborhoods. Both are of moderate size (an average of
8,000 inhabitants per ESAA and 5,600 per census tract), are defined based on social ecological
criteria, and are generally compact and not crossed by major geographic boundaries (e.g., freeways,
major boulevards, and parks). We decided to use census tracts rather than ESAAs as the sampling
unit because tracts generally include children attending two or more elementary schools. Thus, the
use of census tracts as sampling units will give researchers greater scope for examining both
neighborhood and school effects on children’s development.

2 Information on the progress of the fieldwork and on the public use data can be obtained from the project
website at www.lasurvey.rand.org. The website will also provide documentation for the data and results of analyses by
RAND and other researchers.

3 The boundaries of traditional “neighborhoods” such as Rancho Park or Pico Union in the City of Los
Angeles are generally not well defined and not coded in the administrative data needed to produce population and poverty
estimates.

With  information  from  the  L.A.FANS  household  survey,  researchers  can  examine  variation 
in  definitions  of  neighborhoods  and  identify  the  places  where  people  live,  work,  shop,  and  attend 
school.  With  tract-level  data  being  assembled  for  all  of  Los  Angeles  County  as  part  of  this  survey 
project,  researchers  can  choose  whether  to  use  census  tracts  or  consider  larger  “neighborhoods”  in 
their  analyses  by  combining  data  for  adjacent  census  tracts.4 

Sample  Size 

Drawing on information from the design of earlier multipurpose household surveys and
imposing a constraint set by our budget, we initially set the L.A.FANS sample size at
approximately 3,250 households. We also performed a parallel set of sample size
calculations to verify this estimate. The parallel calculations were based on a generic test of
proportions between two comparison groups, which is equivalent to a logistic regression with a single
binary explanatory variable. This is a standard and routinely used approach to calculating the
sample size for a survey. Note that this approach is not based on a particular hypothesis tied to a
specific variable, since the L.A.FANS was designed to address a large number of different research
areas. Rather, it represents a generic approach that is both hypothetical and fairly conservative.

Because L.A.FANS is based on a stratified design (as explained below), this test can be
thought of as a between-strata comparison of proportions. We assumed a baseline proportion of
0.25 in the reference group and a minimum detectable difference of 0.1 (that is, 40 percent of the
baseline proportion) between the baseline group and a comparison group. This means that we would
like to be able to detect, with sufficient statistical precision, the difference between the baseline
group and a group with a proportion of 0.35 (= 0.25 + 0.1). Based on standard power calculations, we
determined that a sample size of 325 per group was required to detect such a difference with a
Type I error rate of 0.05 and power of 0.8. Note that depending on the analysis in question, this may
refer to a sample of 325 adults, children, or individuals belonging to a particular subgroup; thus, the
power associated with any particular analysis depends on the specific sample being considered.
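The per-group figure of 325 can be checked against one common textbook formula for comparing two proportions with a two-sided test and unpooled variances: n = (z_alpha + z_beta)^2 × [p0(1 − p0) + p1(1 − p1)] / (p1 − p0)^2. The paper does not state which formula was used, so this is an assumption; with the stated inputs it gives roughly 326 per group, in line with the 325 reported. `n_per_group` is a hypothetical helper built only on the Python standard library.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p0: float, p1: float, alpha: float = 0.05, power: float = 0.8) -> float:
    """Per-group sample size for a two-sided test of two proportions (unpooled variance)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # critical value for the Type I error rate
    z_beta = z(power)            # quantile corresponding to the desired power
    variance_sum = p0 * (1 - p0) + p1 * (1 - p1)
    return (z_alpha + z_beta) ** 2 * variance_sum / (p1 - p0) ** 2


n = n_per_group(0.25, 0.35)
print(round(n, 1), ceil(n))
```

A pooled-variance variant of the formula gives a slightly larger figure (about 328 per group), so the choice of formula matters only at the margin. Either way, this is the sample size under simple random sampling, before any allowance for clustering.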

If  we  were  to  draw  a  simple  random  sample  (SRS)  from  the  population  of  Los  Angeles 
County,  we  would  require  325  people  per  group.  Practical  considerations  relating  to  cost  and 
logistics — and  the  nature  of  the  study — led  us  to  consider  a  multilevel,  clustered  design.  This  type 
of  sample  results  in  a  design  effect  that  modifies  the  actual  sample  size,  to  yield  the  effective  sample 
size,  in  the  following  way: 

Effective  sample  size  =  Actual  sample  size/Design  effect. 

4 Given the relatively small geographic and population size of census tracts, most alternative neighborhood
definitions are likely to be larger than census tracts.

Clustering produces a design effect greater than 1, so the effective sample size is smaller than the
actual sample size by a factor equal to the design effect. Because we need an effective sample size
of 325, we need an actual sample size that is larger than this. Note that the design effect is an
increasing function of the intra-cluster correlation (ICC) and the cluster size:

Design effect = 1 + Intra-cluster correlation × (Cluster size − 1).

Also,  the  inclusion  of  control  variables  in  regression  models  will  generally  account  for  at  least  part  of 
the  correlation  among  observations  in  the  same  group,  and  hence  lead  to  smaller  design  effects  and 
more  powerful  tests  than  in  simpler  comparisons. 

The intra-cluster correlation is determined by the correlation in behavior and outcomes
among individuals living in the same community and belonging to the same household. Drawing on
prior knowledge of the correlations associated with multistage designs involving communities and
households, we would expect the intra-community correlation to range from 0.01 to 0.05, while the
intra-household correlation may be substantially larger (0.2 to 0.5). However, our sample design
and the types of analyses that will use these data suggest that the design effect will typically be due
to intra-community correlation alone. Based on these estimates, and given a moderate cluster size of,
say, 50 households per community, we calculate the design effects for our sample to range from
1.49 to 3.45. (Note that we conducted a detailed simulation study, described below, to evaluate the
sensitivity of the sample design to cluster size.) The effective sample size calculations and design
effects together suggest that a conservative actual sample size for L.A.FANS should be about 1,100
households per stratum (325 × 3.45 ≈ 1,121). Given that the L.A.FANS design calls for three strata
(see below), the total sample size of 3,250 appears to be reasonable. Using our specific sampling
rules (described below) and characteristics of households in Los Angeles County drawn from recent
waves of the Current Population Survey, we estimate that this will yield a total of approximately
3,250 primary adult respondents and 3,624 sampled children.
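The arithmetic above follows directly from the two formulas for the design effect and the effective sample size. The sketch below reproduces it; the function and constant names are hypothetical, and the inputs (325 per group, 50 households per community, three strata, ICC of 0.01 or 0.05) are the values stated in the text.

```python
def design_effect(icc: float, cluster_size: float) -> float:
    """Design effect = 1 + ICC * (cluster size - 1)."""
    return 1 + icc * (cluster_size - 1)


def effective_n(actual_n: float, deff: float) -> float:
    """Effective sample size = actual sample size / design effect."""
    return actual_n / deff


REQUIRED_EFFECTIVE_N = 325   # per group, from the power calculation
CLUSTER_SIZE = 50            # households per community
N_STRATA = 3

for icc in (0.01, 0.05):
    deff = design_effect(icc, CLUSTER_SIZE)
    per_stratum = REQUIRED_EFFECTIVE_N * deff   # actual n needed for effective n = 325
    print(f"ICC={icc}: deff={deff:.2f}, actual n per stratum={per_stratum:.0f}, "
          f"total over strata={N_STRATA * per_stratum:.0f}")
```

At ICC = 0.05 this yields a design effect of 3.45 and about 1,121 households per stratum; three such strata would need about 3,364 households, close to the 3,250 adopted. At ICC = 0.01 the requirement falls to about 484 per stratum.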

Finally, we performed an additional pair of power calculations to determine whether we
would have a sufficient sample size to undertake a between-strata comparison and a regression
analysis using only the sampled children in L.A.FANS. This was useful because some of the most
important L.A.FANS analyses will relate to children’s outcomes in the areas of health, education,
development, and well-being. The power calculations assumed a sample of 50 households from each
neighborhood and that the sample was divided between poor neighborhoods (60 percent) and non-poor
neighborhoods (40 percent). We accounted for the fact that, by design, 70 percent of the
sampled households have children and that many of these households contribute two children to the
sample. The between-strata comparison looked at a measure of child cognitive development,
namely the score on the reverse digit span (RDS) test. The power calculations were based on
estimates of sample means and standard errors of the RDS from existing studies. The design effect
associated with RDS was determined to be 2.49, and the mean RDS and associated standard deviation
were estimated to be 28.7 and 26.9, respectively. For a hypothesis test comparing the RDS scores
of children living in poor and non-poor neighborhoods, the power of both a one-sided test and a
two-sided test was over 99 percent. We also considered a fixed-effects analysis of the
relationship between birthweight and the RDS score. The power of a test of the significance of
the regression coefficient was well over 80 percent. From these calculations we
concluded that a sample size of 3,250 households was sufficient to meet the central goals of this
study.
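The text does not state the difference in mean RDS scores that the between-strata test was powered to detect, so the sketch below reports the minimum detectable difference instead, and the power for a purely hypothetical 10-point gap. Several inputs are assumptions: the 3,624 sampled children are split 60/40 between poor and non-poor neighborhoods (the text applies that split to the sample as a whole), each group's sample is deflated by the stated design effect of 2.49, and a normal approximation is used. The function names are hypothetical, and the original calculations, based on estimates from existing studies, may differ in detail.

```python
from math import sqrt
from statistics import NormalDist


def _clustered_se(sd, n1, n2, deff):
    """SE of a difference in means after deflating each group's n by the design effect."""
    return sd * sqrt(deff / n1 + deff / n2)


def power_two_means(delta, sd, n1, n2, deff, alpha=0.05, two_sided=True):
    """Normal-approximation power for a difference in means under clustering."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    # Ignores the negligible chance of rejecting in the wrong tail.
    return 1 - z.cdf(z_crit - abs(delta) / _clustered_se(sd, n1, n2, deff))


def min_detectable_difference(sd, n1, n2, deff, alpha=0.05, power=0.8):
    """Smallest two-sided detectable difference at the given alpha and power."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * _clustered_se(sd, n1, n2, deff)


N_CHILDREN = 3624                                     # from the text
n_poor, n_nonpoor = 0.6 * N_CHILDREN, 0.4 * N_CHILDREN  # assumed 60/40 split
SD_RDS, DEFF_RDS = 26.9, 2.49                         # from the text

print(round(min_detectable_difference(SD_RDS, n_poor, n_nonpoor, DEFF_RDS), 2))
print(round(power_two_means(10.0, SD_RDS, n_poor, n_nonpoor, DEFF_RDS), 4))
```

Under these assumptions, a two-sided test at the 0.05 level has 80 percent power to detect a difference of about 4 RDS points, and power for a 10-point gap is essentially 1.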




Cluster  Size/Number  of  Clusters 


Given a sample size of 3,250 households, the next key decision was the number of clusters
to select for the sample. With a fixed sample size, a greater number of clusters implies a smaller
number of sampled households per cluster. However, choosing a large cluster size (and, hence, a
small number of clusters) reduces survey expenses because of savings in the cost of listing
operations, locating respondents, supervising field operations, and making repeat visits to complete
an interview or collect additional information from respondents. But it also increases the
variance of estimates, because of correlation among units in the same cluster. The trade-off between
the number of clusters and increased variance, represented here by the effective sample size, is
illustrated in Figure 1. This figure shows curves for four different values of the intra-cluster
correlation, all with the sample size fixed at 3,250. As the number of clusters increases along the
x-axis, the effective sample size, shown on the y-axis, increases. However, the slopes, representing
the gains in effective sample size, are quite different for the four curves; in particular, the gains
in effective sample size as the number of clusters increases are greatest when the intra-cluster
correlation is weaker.

Figure 1. Effective sample size by number of clusters, holding actual sample size at 3,250
[Figure not reproduced: one curve each for ICC = 0.01, 0.05, 0.1, and 0.2]
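The curves in Figure 1 follow from the design-effect formula once the fixed total of 3,250 households is divided evenly among the clusters. The sketch below tabulates effective sample size for the four ICC values shown in the figure; the cluster counts on the x-axis are hypothetical choices, since the figure's exact grid is not recoverable, and the function name is illustrative.

```python
def effective_n_for_clusters(total_n: int, n_clusters: int, icc: float) -> float:
    """Effective sample size when total_n is split evenly over n_clusters."""
    cluster_size = total_n / n_clusters
    design_effect = 1 + icc * (cluster_size - 1)
    return total_n / design_effect


TOTAL_N = 3250
ICCS = (0.01, 0.05, 0.1, 0.2)                  # the four curves in Figure 1
CLUSTER_COUNTS = (25, 50, 65, 100, 150, 200)   # hypothetical x-axis points

print("clusters" + "".join(f"  ICC={icc:<4}" for icc in ICCS))
for k in CLUSTER_COUNTS:
    row = "".join(f"{effective_n_for_clusters(TOTAL_N, k, icc):10.0f}" for icc in ICCS)
    print(f"{k:>8}{row}")
```

At 65 clusters (50 households each), this gives roughly 2,180 effective households at ICC = 0.01 and 940 at ICC = 0.05, the range the text rounds to 1,000 to 2,200.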


As discussed above, our a priori estimates of intra-community correlation range from 0.01 to
0.05. Thus, our initial choice of dividing the sample of 3,250 households across 65 communities
(equivalent to 50 households per cluster) should yield an effective sample size of between 1,000 and
2,200 households. This is likely to be sufficient for the generic types of analyses that the
L.A.FANS is designed to address, such as those illustrated in our power calculations. Nevertheless,
we were concerned that our simple power calculations might not be adequate. This is because most
analyses based on the L.A.FANS data will employ multilevel models, which are designed to exploit
the within-cluster and between-cluster structure of the sample. We were also uncertain about the degree
of clustering and the distributions of key measures of interest in Los Angeles County census tracts.
We therefore undertook a simulation study to examine in greater detail, and with more confidence,
the trade-offs associated with changing cluster size while holding sample size constant. The goal
was to determine whether a change in the number of clusters would improve the statistical
properties of our sample while keeping costs roughly constant.




The  simulations  were  conducted  using  data  from  the  1990  Census  STF3A  tables  for  Los 
Angeles  County.  These  tables  allowed  us  to  simulate  many  of  the  key  details  of  our  sampling  plan. 
In  particular,  using  a  stratified  two-level  design,  we  first  sampled  census  tracts  and  then  sampled 
block  groups  within  the  selected  tracts.  Although  we  could  not  simulate  the  sampling  of 
households,  we  were  able  to  select  individuals  and,  hence,  to  estimate  individual-level  regression 
models.  We  chose  unemployment  status  as  the  dependent  variable,  considering  only  adults  in  the 
civilian  labor  force;  we  omitted  those  in  the  armed  forces  (0.3  percent)  or  out  of  the  labor  force  (32.8 
percent). We ran a logistic regression model that corrected for clustering using the Huber-White
approach, with unemployment status as the outcome and the following variables as explanatory
factors: race (white/Asian versus black versus other), sex, percent Hispanic in the tract, percent
receiving welfare assistance in the tract, and the block-group unemployment rate. For the
simulations, we considered several competing designs that held the sample size fixed at 3,250
households but differed according to the number of sampled census tracts (and, consequently, the
number of households per tract). For each design we simulated 200 samples and ran a logistic
regression  model  for  each  sample;  finally,  we  computed  the  variability  of  the  regression  coefficients 
and  standard  errors  across  the  200  analyses.  We  compared  the  results  for  the  four  designs  with  51, 
66,  75,  and  81  total  tracts.  Table  1  presents  the  main  results,  which  are  the  standard  errors  for  a 
representative  group  of  regression  coefficients,  for  each  of  the  four  different  sample  designs. 


Table 1. Standard Errors for Logistic Regression Coefficients from a Simulation Analysis Based on a Model of Unemployment Status

                 Coefficient (level of measurement)
Number of        Sex is female   Race is black   Percent Hispanic   Unemployment rate
census tracts    (individual)    (individual)    (tract)            (block group)
51               0.140           0.191           0.196              1.81
66               0.136           0.175           0.188              1.75
75               0.134           0.173           0.189              1.70
81               0.130           0.170           0.190              1.73

Note:  Tracts  were  allocated  equally  across  three  strata.  See  text  for  a  description  of  the  simulations. 


The results in the table show that there was a substantial decline in the standard errors of the
estimated parameters when the number of sampled clusters increased from 51 to 66. Note that
smaller standard errors indicate that the regression parameters were estimated with greater precision
and hence are preferable. In contrast, the declines were much smaller (and, for the percent Hispanic
coefficient, absent) when the number of tracts was increased to 75 or 81. We also examined the
distributions of the estimated coefficients (using boxplots, for instance) under the four different
designs to see the extent to which extreme values of the estimated parameters emerged. We wanted
to choose a scheme that offered some protection against drawing a sample that turned out to be an
outlying case. The results indicated a very low likelihood of such an adverse outcome with 66 or
more tracts, but a somewhat higher likelihood with 51 tracts.
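The logic of the design comparison can be illustrated with a much simpler Monte Carlo sketch. It is not the authors' procedure: instead of drawing census tracts and block groups and fitting a logistic regression with Huber-White standard errors, it generates a continuous outcome with an assumed intra-cluster correlation, holds the total sample fixed, and measures how much the estimated mean varies across 200 replicate samples as the number of clusters changes. The function name and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_se(n_total, n_clusters, icc, reps=200, seed=0):
    """Spread (SD) of the overall sample mean across replicate samples.

    n_total units are split equally across n_clusters clusters; the outcome
    has total variance 1, of which a share `icc` lies between clusters.
    """
    rng = random.Random(seed)
    m = n_total // n_clusters            # units per cluster
    sd_between = icc ** 0.5              # SD of the cluster effect
    sd_within = (1 - icc) ** 0.5         # SD of the individual deviation
    means = []
    for _ in range(reps):
        total = 0.0
        for _ in range(n_clusters):
            u = rng.gauss(0, sd_between)  # shared by everyone in the cluster
            total += sum(u + rng.gauss(0, sd_within) for _ in range(m))
        means.append(total / (m * n_clusters))
    return statistics.stdev(means)
```

For example, `simulate_se(3250, 51, 0.05)` and `simulate_se(3250, 81, 0.05)` compare two of the designs in Table 1 under an assumed ICC of 0.05. The simulated spread tracks the analytic variance of the mean of k equal clusters, ρ/k + (1 - ρ)/n, which falls as k grows with n fixed.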

Several  additional  issues  may  influence  the  choice  of  the  number  of  clusters  in  the  sample 
and  average  cluster  size.  These  mainly  concern  the  fixed  fieldwork  costs  per  cluster.  To  this  point, 
we  have  discussed  cost  issues  only  as  they  have  influenced  overall  sample  size.  However,  holding 
overall  sample  size  constant,  an  increase  in  the  number  of  clusters  will  raise  costs  because  there  are 
certain  fixed  costs  for  each  cluster  in  the  sample.  These  fixed  costs  cover  various  aspects  of 
fieldwork  operations  and  the  assembly  or  calculation  of  neighborhood-level  measures — which  are 
important  for  studying  the  main  research  questions  that  the  L.A.FANS  is  designed  to  address.  The 
costs  for  neighborhood-level  data  may  be  quite  small  if  they  are  largely  available  through  external 
sources,  such  as  administrative  records  or  the  decennial  census.  The  costs  may  be  higher  if 
community  data  are  collected  through,  for  example,  a  neighborhood  key  informant  survey.  The 
L.A.FANS  includes  such  a  survey,  but  the  marginal  costs  are  low  because  it  will  be  conducted  by 
telephone.  Finally,  for  neighborhood-level  measures  that  are  estimated  from  the  household 
interview  data,  there  are  two  competing  concerns.  A  larger  number  of  households  per  cluster  will 
improve  the  precision  of  the  cluster-level  estimates;  on  the  other  hand,  a  certain  number  of  clusters 
is required in order to capture cross-cluster variability. However, since the minimum number of
clusters needed to capture cross-cluster variability is likely to be substantially smaller than the range
that we have been examining in the simulations, this latter consideration is unlikely to be important.
Thus,  cost  and  design  considerations  all  argue  for  choosing  the  smallest  number  of  clusters  that  meet 
the  other  needs  reflected  in  the  simulations.  Based  on  the  analysis  described  above,  we  decided  to 
select  a  total  of  65  tracts  for  the  L.A.FANS  sample  from  among  the  1,624  eligible  tracts  in  Los 
Angeles  County.5 
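The trade-off between fixed per-cluster costs and the statistical gains from more clusters can be made concrete with a toy cost model. The model below is an assumption for illustration (a fixed cost per cluster plus a constant cost per household interview), the design effect uses the Kish approximation, and the cost figures in the checks are hypothetical, not L.A.FANS budgets.

```python
def cost_per_effective_household(n_total, n_clusters, icc,
                                 fixed_per_cluster, per_household):
    """Total fieldwork cost divided by the effective sample size.

    Assumed cost model: each cluster carries a fixed cost, and every
    household interview costs the same amount wherever it is done.
    """
    cost = n_clusters * fixed_per_cluster + n_total * per_household
    m = n_total / n_clusters                       # households per cluster
    n_eff = n_total / (1.0 + (m - 1.0) * icc)      # Kish design effect
    return cost / n_eff
```

With no clustering (icc = 0), any fixed per-cluster cost makes extra clusters pure overhead; with no fixed cost, more clusters always lower the cost per effective household. Real designs fall between these extremes, which is why the number of clusters is chosen as the smallest that meets the precision needs.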

Stratification 

The  L.A.FANS  has  a  stratified  sampling  design.  Stratification  was  adopted  in  order  to 
obtain  an  oversample  of  poor  and  very  poor  tracts  which,  in  turn,  provides  us  with  a  relatively  large 
number  of  respondents  in  poor  households  and  of  welfare  recipients.  A  key  feature  of  the 
L.A.FANS,  however,  is  that  it  includes  a  sample  of  neighborhoods  across  the  entire  income  range. 
This  is  important  because  neighborhoods  are  unlikely  to  exert  an  influence  just  for  children  growing 
up  in  poor  areas;  rather,  neighborhoods  may  also  affect  children  growing  up  in  middle-class  or 
affluent  areas.  Nevertheless,  the  poorest  neighborhoods  are  of  particular  scientific  and  policy 
interest  and  it  is  important  to  be  able  to  conduct  strata-specific  analyses  for  the  poorest 
neighborhoods  and  compare  findings  to  results  for  other  strata. 

Oversampling poor and very poor areas was judged to be an easier and more cost-effective way to
identify current or former welfare recipients than screening households. A screening operation
would have required interviewers to ask questions about welfare receipt soon after contacting a
potential respondent. This would have been both time consuming and expensive, as well as
awkward for the interviewer and the respondent. Stratification also reduces the variance of many
estimates based on the full sample. This is because respondents are similar to others within the
same stratum on several important measures, such as rates of welfare participation, but quite
different from respondents in other strata. It is straightforward to show that under these conditions,
stratification will yield more precise estimates of population parameters (see Kish, 1965).
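The variance reduction from stratification can be stated with the textbook comparison between a simple random sample and a proportionally allocated stratified sample of the same size n, ignoring finite-population corrections. The notation is ours: W_h is the population share of stratum h, S_h^2 its within-stratum variance, \bar{Y}_h its mean, and \bar{Y} the overall mean. Because L.A.FANS allocates tracts disproportionately, this illustrates the general principle rather than the survey's actual variances.

```latex
\begin{aligned}
V_{\mathrm{SRS}}(\bar{y}) &\approx \frac{1}{n}\sum_{h} W_h S_h^2
   + \frac{1}{n}\sum_{h} W_h\,(\bar{Y}_h - \bar{Y})^2, \\
V_{\mathrm{prop}}(\bar{y}_{st}) &= \frac{1}{n}\sum_{h} W_h S_h^2, \\
V_{\mathrm{SRS}}(\bar{y}) - V_{\mathrm{prop}}(\bar{y}_{st}) &\approx
   \frac{1}{n}\sum_{h} W_h\,(\bar{Y}_h - \bar{Y})^2 \;\ge\; 0.
\end{aligned}
```

The gain is largest when stratum means differ sharply, as welfare participation rates do across poverty strata.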


5 Of the 1,652 tracts in Los Angeles County, a total of 28 tracts were removed from the sampling frame (see
Appendix A for a listing of these tracts). There were three reasons for dropping tracts. First, 13 tracts were deemed
ineligible because they had a high percentage (over 80%) of persons living in group quarters, based on 1990 Census
data. Second, 11 tracts representing ships-at-sea were excluded. Finally, we dropped 4 tracts for which essential data
were missing. Thus, the sampling frame consists of 1,624 tracts.




Prior to sampling, census tracts were divided into three strata based on the percent of the tract's
population in poverty. Tract-level estimates of percent in poverty in 1997 were
developed  by  Los  Angeles  County’s  Urban  Research  Division  (URD)  using  state  and  county 
administrative  data.  The  sampling  strata  in  the  L.A.FANS  design  correspond  to  tracts  that  are  very 
poor  (those  in  the  top  10  percent  of  the  poverty  distribution),  poor  (tracts  in  the  60-89th 
percentiles),  and  non-poor  (tracts  in  the  bottom  60  percent  of  the  distribution).  The  choice  of  three 
strata  and  the  specific  cut-offs  were  based  on  an  analysis  that  examined  the  trade-off  under  different 
schemes  between  likely  yield  of  welfare  recipients  and  the  concentration  of  the  sample  in  a  small 
number  of  high  poverty  areas.  The  chosen  scheme  represented  the  best  balance  between  these  two 
competing  objectives. 
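The percentile cut-offs can be expressed as a simple rank rule. This sketch assumes tracts are ranked by their estimated percent in poverty and that ties are broken by input order; the text does not describe tie handling, and the function name is illustrative.

```python
def assign_strata(poverty_rates):
    """Label each tract by its percentile rank in the tract poverty distribution.

    rank / n is the share of tracts with lower poverty; sorted() is stable,
    so tied tracts keep their input order (an assumption).
    """
    n = len(poverty_rates)
    order = sorted(range(n), key=lambda i: poverty_rates[i])
    labels = [""] * n
    for rank, i in enumerate(order):
        if 10 * rank >= 9 * n:       # top 10 percent
            labels[i] = "very poor"
        elif 10 * rank >= 6 * n:     # 60th-89th percentiles
            labels[i] = "poor"
        else:                        # bottom 60 percent
            labels[i] = "non-poor"
    return labels
```

Integer comparisons avoid floating-point edge cases at the boundaries. Exact stratum counts depend on how ties and boundary ranks are handled.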

A  key  decision  regarding  the  stratification  scheme  concerns  the  allocation  of  the  sample  of 
tracts across strata. There are several considerations in determining this allocation. For example,
holding  the  number  of  respondents  per  tract  constant,  different  schemes  will  provide  varying  yields 
of  poor  families,  welfare  recipients,  and  respondents  of  different  race  and  ethnic  groups.  To  decide 
on  the  allocation  of  tracts  across  strata,  we  undertook  an  analysis  of  alternative  schemes  as  part  of 
the  detailed  simulation  exercise. 

We considered several alternative schemes for allocating tracts across the three strata (see
Table 2). Scheme A allocates an approximately equal number of census tracts to the three strata. It
selects 20 tracts from each of the very poor and poor strata and 25 from the non-poor stratum. We
allocated a slightly larger share to the non-poor stratum under this scheme because this stratum
covers a large share of the population (about 56 percent). This scheme, which provides a
disproportionately large sample from the smaller strata (poor and very poor), will also yield a large
sample of families and children in the poorest neighborhoods, which are of particular research and
policy interest. A relatively even allocation of the sample across the three strata will provide
sample-based estimates with similar levels of precision across each stratum, which improves the
efficiency (i.e., reduces the variance) of cross-strata comparisons. Scheme B, which allocates the
sample of tracts in proportion to stratum size, has the attraction of leading to estimates with the
smallest variances for the population as a whole. However, given our balanced sample design (i.e.,
we plan to interview the same number of households per tract), this scheme would lead to a
relatively small number of respondents in poor and very poor tracts. This is undesirable given the
research goals of this study. Schemes C and D oversample very poor and poor tracts relative to
non-poor tracts in ratios of 2:2:1 and 3:2:1, respectively. These two schemes provide a substantially
larger sample of poor households and households receiving welfare. However, a disadvantage is that
poorer tracts, especially very poor tracts, are greatly overrepresented in the entire sample. This also
reduces the efficiency of cross-stratum comparisons.
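The ratio-based schemes C and D can be reproduced by splitting the 65 tracts in the stated ratios and rounding. The largest-remainder rounding rule below is our assumption (the text does not say how fractional tracts were rounded), but it returns exactly the C and D allocations in Table 2. Scheme B is not reproduced because the text does not state which size measure or rounding rule produced its counts.

```python
from fractions import Fraction
from math import floor


def allocate(total: int, ratios: tuple[int, ...]) -> list[int]:
    """Split `total` tracts across strata in the given ratios.

    Exact quotas are floored, and the leftover tracts go to the strata
    with the largest fractional parts (largest-remainder rounding).
    """
    shares = [Fraction(total * r, sum(ratios)) for r in ratios]
    alloc = [floor(s) for s in shares]
    leftover = total - sum(alloc)
    order = sorted(range(len(ratios)), key=lambda i: shares[i] - alloc[i],
                   reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


if __name__ == "__main__":
    print("C:", allocate(65, (2, 2, 1)))  # very poor, poor, non-poor
    print("D:", allocate(65, (3, 2, 1)))
```

Using `Fraction` keeps the quotas exact, so a quota such as 32.5 is not perturbed by floating-point error before rounding.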


Table 2. Schemes for Allocating 65 Sampled Tracts Across Strata

                                                         Stratum
Scheme   Description                                     Very poor   Poor   Not poor
A        Roughly equal tracts per stratum                20          20     25
B        Allocate tracts proportional to stratum size    7           22     36
C        2:2:1 allocation                                26          26     13
D        3:2:1 allocation                                32          22     11



To  decide  between  these  alternatives,  we  examined  the  trade-off  in  efficiency  across  these 
different  schemes  as  part  of  our  simulation  analysis.  We  also  compared  the  yield  of  minorities  and 
welfare  recipients.  The  number  of  welfare  recipients  in  2000  was  estimated  to  be  80  percent  of  the 
1997 counts, to account for the decline in program participation over this period. The simulation
analysis, as well as efficiency considerations, led us to choose Scheme A. It also showed that
oversampling of poor tracts was necessary to obtain a sufficient number of welfare recipients and
minorities, particularly blacks (who are concentrated in poorer neighborhoods).

We  considered  further  stratifying  the  sample  of  neighborhoods  by  race  and  ethnicity.  The 
main  reason  for  doing  this  would  be  to  obtain  a  sufficient  sample  of  the  smaller  race/ethnic  groups  in 
Los  Angeles  County,  notably  blacks  (who  comprise  10  percent  of  the  population)  and  Asians  (who 
make  up  13  percent  of  the  population).  Although  Asians  are  a  larger  portion  of  the  population  than 
blacks,  they  are  a  heterogeneous  group.  Major  Asian  groups  in  Los  Angeles  County  include 
Koreans  and  Chinese,  although  there  are  also  sizable  populations  of  a  wide  variety  of  groups, 
including  Cambodians  and  Armenians. 

One  option  was  to  sample  blocks  according  to  some  function  of  their  race/ethnic 
composition.  However,  stratifying  tracts  by  race  and  ethnicity  as  well  as  by  income  was  very 
complex.  There  are  four  major  ethnic  groups  in  Los  Angeles  County — Hispanics,  whites,  blacks 
and  Asians.  But  there  is  no  easy  way  to  categorize  tracts  according  to  their  ethnic  composition  into 
a  small  number  of  groups  because  most  tracts  have  at  least  two  groups — and  many  times  all  four 
groups — but  in  widely  varying  proportions.  Thus  there  was  too  wide  a  range  of  neighborhood 
types  to  justify  oversampling  some  subset  of  them.  Another  option  was  to  oversample 
respondents  of  certain  ethnicities  within  tracts  selected  independently  of  ethnic  composition.  With 
our  oversample  of  very  poor  and  poor  neighborhoods  we  expected  to  effectively  obtain  an 
oversample  of  blacks.  We  decided  against  oversampling  Asians  because  of  cost  and  logistical 
considerations  (e.g.,  translation  and  programming  costs  and  difficulty  finding  bilingual  interviewers). 


Table  3.  Stratification  of  Tracts  in  Los  Angeles  County 


                                                All tracts                             Sampled tracts
           Percentile rank   Weighted average   Number of   Estimated    Percent     Number of   Estimated    Percent
Stratum    by percent in     percent in         tracts      population   of total    tracts      population   of total
           poverty           poverty
Very poor  90-100th          47%                161         881,956      9%          20          134,407      27%
Poor       60-89th           30%                490         3,302,831    34%         20          179,210      37%
Non poor   1-59th            10%                973         5,409,384    56%         25          177,145      36%
Total      -                 -                  1,624       9,594,171    100%        65          490,762      100%

Note:  Population  and  poverty  estimates  are  for  1997  and  are  based  on  data  from  the  Los  Angeles  County  Urban 
Research  Division. 


Table 3 provides descriptive statistics for the final stratification scheme. Tables 4 and 5
provide weighted summary tract characteristics for all tracts in Los Angeles County (Table 4) and
the 65 tracts in the L.A.FANS sample (Table 5). Information is provided on percent in poverty,
median  household  income,  percent  of  persons  receiving  welfare,  percent  of  households  with  children 
under  age  18,  and  race  and  ethnic  composition.  With  the  exception  of  the  percent  of  households 

with  a  child  under  18,  which  is  from  the  1990  Census,  all  the  data  are  from  the  Los  Angeles  County 
Urban  Research  Division’s  estimates  for  1995  (median  household  income)  and  1997  (all  other 
variables). 


Table  4.  Weighted  Average  Tract  Characteristics  by  Strata  for  All  of  Los  Angeles  County 


Stratum     Number     Population   Median HH     Welfare       HHs with children   White   Black   Hispanic   Asian
            of tracts  in poverty   income        population    under 18
Very poor   161        47%          [illegible]   [illegible]   46%                 5%      20%     68%        6%
Poor        490        30%          [illegible]   [illegible]   40%                 15%     11%     62%        11%
Non poor    973        10%          [illegible]   [illegible]   30%                 49%     6%      29%        15%
Total       1,624      20%          $26,500       8%            34%                 33%     9%      44%        13%

Note:  All  figures  are  1997  estimates  except  for  median  HH  income  which  corresponds  to  1995  and  HHs  with  child 
under  18,  which  is  from  the  1990  Census. 


A comparison of the figures in Tables 4 and 5 reveals that characteristics of the sampled
tracts generally corresponded well to the tracts that they represent, within each stratum. There are
differences,  due  to  sampling  variability,  in  the  percent  of  households  with  children  under  18  and  in 
the  race/ethnic  composition  of  the  tracts.  Sampled  tracts  in  the  very  poor  stratum  have  fewer 
whites  and  blacks  than  expected,  but  more  Hispanics.  In  the  poor  stratum  there  are  fewer  blacks 
and  more  Hispanics.  The  totals  obviously  differ  because  of  the  disproportionate  number  of  very 
poor  and  poor  tracts  among  the  65  sampled  tracts. 


Table  5.  Weighted  Tract  Characteristics  by  Strata  for  65  Tracts  in  Sample 


Stratum     Number     Population   Median HH   Welfare      HHs with children   White   Black   Hispanic   Asian
            of tracts  in poverty   income      population   under 18
Very poor   20         47%          $12,600     19%          53%                 3%      15%     75%        7%
Poor        20         30%          $17,300     12%          43%                 15%     5%      71%        9%
Non poor    25         10%          $32,700     4%           34%                 52%     4%      28%        14%
Total       65         28%          $22,800     11%          42%                 25%     7%      57%        10%

Note:  All  figures  are  1997  estimates  except  for  median  HH  income  which  corresponds  to  1995  and  HHs  with  child 
under  18,  which  is  from  the  1990  Census. 


Sampling  Tracts  Within  Strata 

The  remainder  of  the  L.A.FANS  sampling  plan  involves  selecting  tracts  within  strata,  blocks 
within  tracts,  households  within  blocks,  and,  finally,  respondents  within  households.  One 
consideration  guiding  the  selection  of  tracts  within  strata  is  the  desire  to  obtain  equal  samples  of 
households  and  respondents  from  each  tract,  even  though  tracts  vary  greatly  in  size.  There  are  two 
main  reasons  for  this.  First,  tract-level  means  will  be  used  in  a  variety  of  L.A.FANS  analyses; 
having  equal  samples  across  clusters  will  strengthen  these  analyses  because  it  will  lead  to  tract-level 
estimates  with  roughly  equal  variances.  Second,  for  the  analysis  of  the  relationships  among 
members  of  the  same  cluster,  equal  numbers  may  also  be  preferable. 




We have good estimates of the population sizes of Los Angeles County tracts in 1997,
which allow us to sample census tracts within each stratum with probability proportional to
population size. By then selecting an equal number of households (50) from each tract, we have a
multistage design with a minimum design effect. Ignoring the intermediate step of selecting blocks
within tracts, the probability of selecting the $i$th household in the $j$th tract in the $k$th stratum is
the probability that the tract is selected multiplied by the probability that the household is selected
within that tract:

$$\Pr(HH_{ijk}) = \frac{n_k\,p_{jk}}{P_k} \times \frac{50}{p_{jk}} = \frac{50\,n_k}{P_k} = \frac{n_k}{N_k} \times \frac{50}{\bar{P}_k},$$

where $n_k$ is the number of tracts to be selected in the $k$th stratum, $N_k$ is the total number of tracts in
the $k$th stratum, $p_{jk}$ is the population of the $j$th tract in the $k$th stratum, $P_k$ is the total
population in the $k$th stratum, and $\bar{P}_k = P_k/N_k$ is the average tract population in the $k$th stratum.
Because this expression does not depend on any characteristics of the tract, the sampling
probabilities (and hence the sampling weights) are the same for all households in the stratum,
resulting in a self-weighted sample. As a consequence, the weights are easily computed, and the
design is efficient because there is little variability in the sampling weights.
The  probability  proportional  to  size  design  was  implemented  using  a  systematic  sampling 
algorithm. 
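A systematic probability-proportional-to-size (PPS) selection can be sketched as follows. This is a generic textbook version, not the survey's production code: units are laid end to end on a line whose length is their total size, and n equally spaced points after a random start pick the units they land in. Assuming the standard PPS tract probability $n_k p_{jk}/P_k$ and 50 households per selected tract, the household probability reduces to $50 n_k / P_k$, so every household in a stratum receives the same weight. The function names are illustrative.

```python
import random
from itertools import accumulate


def systematic_pps(sizes, n, rng):
    """Select n units with probability proportional to size (systematic PPS).

    A unit larger than the sampling interval can be hit more than once.
    """
    cum = list(accumulate(sizes))          # cumulative size boundaries
    interval = cum[-1] / n
    start = rng.uniform(0, interval)       # random start in [0, interval)
    selected, j = [], 0
    for i in range(n):
        point = start + i * interval
        # advance to the unit whose cumulative range contains the point
        while j < len(cum) - 1 and cum[j] <= point:
            j += 1
        selected.append(j)
    return selected


def household_weight(n_tracts, stratum_pop, hh_per_tract=50):
    """Inverse of Pr(HH) = hh_per_tract * n_k / P_k; constant within a stratum."""
    return stratum_pop / (hh_per_tract * n_tracts)


if __name__ == "__main__":
    rng = random.Random(2000)
    tract_pops = [3200, 4100, 5600, 2800, 6100, 4700, 3900, 5200]  # hypothetical
    print(systematic_pps(tract_pops, 3, rng))
```

The possibility of repeat hits on very large units is the reason the block stage described below selects such blocks with certainty.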

Sample  of  Blocks  within  Selected  Tracts 

In  the  second  sampling  stage,  we  selected  census  blocks  within  each  sampled  census  tract. 
We  then  sampled  households  from  these  blocks  (rather  than  from  the  tract  as  a  whole)  in  order  to 
simplify  the  fieldwork  associated  with  listing  addresses,  interviewing  households,  and  monitoring 
operations — and  hence  reduce  the  costs.  However,  a  sufficient  number  of  blocks  was  selected  in 
order  to  retain  the  diversity  within  each  tract. 

We  determined  the  number  of  blocks  to  be  sampled  in  each  tract  by  dividing  the  target 
number  of  listings  per  tract  (615)  by  the  tract’s  weighted  average  block  size.  Because  the  weighted 
average block size varies by tract, so too does the number of sampled blocks. In total, 439 blocks
were selected (of which 11 turned out to have no households), for an average of 6.6 non-zero blocks
per tract and with a range between 2 and 14. The target number of listings per tract of 615 is equal
to our total target number of listings for the entire sample (40,000) divided by the number of tracts
in the sample (65), rounded to the nearest whole number. The total number of listings was based on
estimated rates of occupancy, nonresponse, households with children, and limited English/Spanish
proficiency. We chose to list the
same  number  of  dwelling  units  per  tract  because  we  planned  to  interview  the  same  number  of 
households  per  tract.  Note  that  we  also  plan  to  interview  the  same  number  of  households  per  block, 
within  each  tract.  This  will  serve  to  minimize  the  harmonic  mean  and,  therefore,  both  the  block-  and 
tract-level  cluster  effect. 
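The block-count rule above can be sketched in a few lines. Two details are our assumptions rather than statements from the text: the "weighted average block size" is taken to weight each block by its own size (the expected size of a block drawn with probability proportional to size), and the quotient is rounded to the nearest whole block. The block sizes in the checks are hypothetical.

```python
def target_listings_per_tract(total_listings=40_000, n_tracts=65):
    """Total listing target spread evenly over tracts: 40,000 / 65 -> 615."""
    return round(total_listings / n_tracts)


def weighted_average_block_size(block_sizes):
    """Size-weighted mean block size (assumed weighting: each block by its size)."""
    return sum(s * s for s in block_sizes) / sum(block_sizes)


def blocks_to_sample(target_listings, avg_block_size):
    """Blocks needed to reach the listing target (assumed: nearest integer, min 1)."""
    return max(1, round(target_listings / avg_block_size))
```

For a hypothetical tract whose blocks have 60 and 120 dwelling units, the weighted average is 100, so about 6 blocks would be sampled, close to the 6.6 average reported above.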

In  order  to  obtain  updated  estimates  of  block  population  for  our  sampled  tracts,  we  used 
block-level data from the 1990 Census and tract-level population estimates for 1997 from the Los
Angeles  County  Urban  Research  Division.  We  assumed  that  the  change  in  block  population 
between  1990  and  1997  was  uniform  for  all  blocks  within  the  same  tract.  This  procedure  worked 
well  because  the  1997  estimates  were  of  very  high  quality  and  the  assumption  of  uniform  growth 
across  blocks  in  the  same  tract  was  generally  reasonable.  However,  there  were  certain  tracts  and 
blocks  where  this  assumption  did  not  hold.  The  main  issue  we  needed  to  guard  against  was  blocks 
that  had  a  substantially  larger  population  size  than  we  estimated.  This  occurred  often  when  there 
was  major  new  construction.  Therefore  as  a  first  step  in  the  listing  operation  we  obtained  a  rough 
count  of  dwelling  units  in  each  block.  This  allowed  us  to  identify  blocks  for  which  our  estimates 
represented  a  substantial  undercount.  For  these  cases,  we  split  the  blocks  into  smaller  units  based 
on  visits  to  the  area  and  careful  counting  and  mapping.  We  subsampled  these  smaller  units  to  yield 
a  count  of  listings  close  to  the  desired  size. 
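The uniform-growth update amounts to scaling each 1990 block count by its tract's 1990-1997 growth factor. The second function is a hypothetical screening rule for the field check on new construction; the 1.5 tolerance is our illustrative choice, and it assumes the estimate has already been expressed in dwelling units so it can be compared with the rough field count.

```python
def update_block_populations(block_pop_1990, tract_pop_1990, tract_pop_1997):
    """Scale every 1990 block count by its tract's 1990-1997 growth factor."""
    growth = tract_pop_1997 / tract_pop_1990
    return {block: pop * growth for block, pop in block_pop_1990.items()}


def needs_split(estimated_units, counted_units, tolerance=1.5):
    """Hypothetical rule: flag a block whose rough field count far exceeds
    its estimate (e.g., because of new construction) for splitting."""
    return counted_units > tolerance * estimated_units
```

A tract that grew from 400 to 500 residents has a growth factor of 1.25, so a block with 100 residents in 1990 is estimated at 125 in 1997.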

We sampled census blocks with probability proportional to block
population  size.  First,  however,  we  calculated  the  distribution  of  block  size  for  Los  Angeles 
County  and  set  the  block  sizes  in  the  lowest  five  percentiles  to  the  fifth  percentile  in  order  to  put  a 
floor  on  the  sampling  probabilities.  This  results  in  a  ceiling  on  the  weights  and  protects  against  an 
unusual  block  having  a  very  large  weight,  although  it  does  result  in  some  loss  of  efficiency.  In 
addition,  very  large  blocks  were  selected  with  certainty  to  avoid  being  sampled  more  than  once  in 
the  systematic  sampling  algorithm  that  we  employed. 
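Both safeguards can be sketched as preprocessing steps before a systematic PPS draw. The 5th percentile is computed here with the default method of Python's `statistics.quantiles`, which may differ from the percentile definition the survey used, and the iterative certainty rule (repeat until no remaining block is at least as large as the sampling interval) is a common convention we assume rather than one the text specifies.

```python
import statistics


def floor_block_sizes(sizes, county_sizes):
    """Raise block sizes below the county 5th percentile to that value."""
    p5 = statistics.quantiles(county_sizes, n=20)[0]  # first cut point = 5th pct
    return [max(s, p5) for s in sizes]


def certainty_blocks(sizes, n):
    """Indices of blocks at least as large as the sampling interval.

    Removing a certainty block changes the interval for the rest, so the
    check is repeated until no further block qualifies.
    """
    certain = set()
    while len(certain) < n:
        rest = [i for i in range(len(sizes)) if i not in certain]
        interval = sum(sizes[i] for i in rest) / (n - len(certain))
        big = {i for i in rest if sizes[i] >= interval}
        if not big:
            break
        certain |= big
    return certain
```

The remaining blocks would then be drawn by systematic PPS for the n minus the number of certainty selections still required. The floor trades a little efficiency for a cap on the largest weights, as the text notes.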

Sample  of  Households 

The  goal  of  the  third  sampling  stage  was  to  select  50  households  within  each  tract.  This 
number  of  households  per  tract  was  set  by  our  decision  to  sample  65  tracts  and  our  desire  to  have  a 
balanced  design,  in  which  the  same  number  of  households  per  tract  is  interviewed.  A  balanced 
design  is  required  for  certain  modeling  approaches  and  minimizes  the  cluster  effect  and  the  variance. 
The  50  households  were  allocated  evenly  across  the  sampled  blocks  in  each  tract.  Thus  within  each 
tract,  we  sampled  the  same  number  of  households  per  block  (although  households  per  block  varied 
across  tracts). 

The  L.A.FANS  design  calls  for  interviewing  adults  and  children  living  in  standard  housing 
units,  based  on  definitions  typical  of  those  used  in  other  household  surveys.  Dwelling  units  that 
were  eligible  for  selection  include  houses,  apartments,  mobile  homes  and  converted  garages. 
Excluded  dwelling  units  are  institutions  (such  as  prisons,  barracks,  ships,  detention  centers,  and 
nursing  homes);  non-institutional  group  quarters  (such  as  shelters,  halfway  houses,  group  homes  for 
the  disabled);  hotels  and  motels,  including  SROs;  and  temporary  arrangements  such  as  tents,  cars  or 
vans, recreational vehicles, trailers, and boats. Note that homeless people are excluded from the
baseline sample (though we ask retrospective questions about homelessness and will follow people
prospectively to collect information about homelessness).

The  50  households  were  selected  at  random  from  a  listing  of  all  dwelling  units  within  the 
sampled  blocks.  In  total  we  listed  about  41,000  addresses,  slightly  more  than  the  target  of  40,000. 
Households with children under 18 years of age will be oversampled so that they make up 70
percent  of  the  sample,  compared  to  an  average  of  35  percent  they  would  otherwise  comprise. 
Households  that  are  unable  to  complete  the  interviews  in  one  of  the  two  survey  languages — English 
and  Spanish — will  be  excluded  from  the  sample.  Our  original  intention  was  to  carry  out  interviews 
in  three  languages.  We  explored  in  detail  which  languages  are  spoken  in  the  sampled  census  tracts 
using  up-to-date  home  language  information  from  the  Los  Angeles  Unified  School  District  and  the 
Los  Angeles  County  Office  of  Education  for  schools  in  or  near  the  census  tracts  in  our  sample.  Our 
analysis  of  these  school  data  showed  that  many  languages  are  spoken  in  these  census  tracts,  but  that 
each  language  is  spoken  by  less  than  5  percent  of  the  population.  The  two  most  common  languages 
(aside  from  English  and  Spanish)  are  Armenian  and  Cambodian/Khmer,  each  with  less  than  1-2 
percent  of  the  total  population  in  the  L.A.FANS  tracts.  Because  so  few  people  speak  any  third 
language  and  given  the  substantial  costs  of  high  quality  translation,  CAPI  programming,  and 
interviewing  in  a  third  language,  L.A.FANS  interviewing  is  restricted  to  English  and  Spanish. 
However,  basic  information  on  these  excluded  households  will  be  collected. 
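Raising households with children from about 35 percent of listed households to 70 percent of the sample implies a specific ratio of sampling rates, which follows from simple arithmetic. This sketch ignores nonresponse, language exclusions, and the other sampling stages; the function name is illustrative.

```python
def relative_sampling_rate(pop_share, sample_share):
    """Sampling rate of the oversampled group relative to everyone else.

    Each group's rate is proportional to its sample share divided by its
    population share; the ratio of the two rates is returned.
    """
    return (sample_share / pop_share) / ((1 - sample_share) / (1 - pop_share))
```

With `relative_sampling_rate(0.35, 0.70)` the ratio is 13/3, about 4.3: households with children are sampled at roughly 4.3 times the rate of other households, so, other stages aside, households without children carry design weights about 4.3 times larger.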

Household  Survey  Respondents 

In  the  final  sampling  stage,  one  adult  respondent  will  be  sampled  at  random  in  each  selected 
household  (designated  the  RSA  or  randomly  sampled  adult).  In  households  with  children,  one  child 
respondent  will  also  be  selected  at  random  and  designated  the  RSC  (randomly  sampled  child). 

These  two  respondents  will  be  followed  throughout  the  longitudinal  survey.  The  reason  for 
selecting  two  primary  respondents  in  each  household,  one  adult  and  one  child,  is  that  it  makes  the 
rules  straightforward  for  tracking  respondents  in  subsequent  waves,  especially  if  the  original 
household  splits.  Other  adults  and  children  will  be  selected  for  the  sample  based  on  their 
relationship  with  the  two  primary  respondents  at  the  time  of  each  interview. 

Residents eligible for selection as primary or secondary respondents are those who live in
the sampled household at least half of the time. People who are temporarily away from the
household  are  eligible  for  selection;  this  includes  patients  in  hospitals,  vacationers,  and  business 
travelers.  Part-time  residents  who  spend  less  than  half  time  in  the  household  and  overnight  visitors 
in  the  past  two  weeks  are  not  eligible  for  selection,  although  they  are  listed  in  the  roster  and  basic 
demographic  information  on  them  is  collected.6 

Table  6  summarizes  which  respondents  will  be  interviewed  in  households  with  and  without 
children.  In  households  with  children,  the  mother  of  the  randomly  selected  child  will  be  selected  as  a 
respondent  and  termed  the  Primary  Caregiver  (PCG).  If  the  RSC’s  mother  does  not  live  in  the 
household  or  is  unable  to  answer  questions  about  the  child,  the  child’s  actual  primary  caregiver  will 
be  selected  as  the  respondent  to  provide  information  on  the  selected  child.  Note  that  the  selection 
of the PCG does not depend on his or her age; if necessary, we will select and interview a mother
or other primary caregiver who is under 18 years of age.


Table  6.  L.A.FANS  Respondents  by  Household  Type 


HHs with children <18              HHs without children <18
Randomly selected adult (RSA)      Randomly selected adult (RSA)
Randomly selected child (RSC)
Primary Caregiver of RSC (PCG)
Sibling of RSC (SIB)

If the RSC has one or more siblings under 18 years of age who share the same biological or adoptive mother and the same PCG, we will randomly select one of them for interview (and designate this child as the SIB). Based on data for Los Angeles County from the 1996-97 Current Population Survey (CPS), we expect that an eligible sibling will be present in approximately 60 percent of households with children. Under these selection rules, in some households with two adults and two children all four people will be interviewed. Also, in many cases the RSA and the PCG will be the same person, since the RSA is chosen at random from among all the adults in the household. Based on the 1996-97 CPS data for Los Angeles County, we estimate that the RSA will be the RSC's primary caregiver approximately half the time and the PCG's spouse almost 30 percent of the time. Only in about 20 percent of cases will the RSA be another adult in the household. Finally, we have made provisions for households in which all members are under 18 years of age and for those in which no adult functions as a PCG for a selected child (e.g., a household of roommates, some of whom are under 18). The same rules for selecting a child at random will apply, but a sampled child without a primary caregiver in the household will be deemed an emancipated minor. Emancipated minors will be treated the same as randomly selected adult respondents in terms of the questionnaires they receive and how they are followed in subsequent waves.

6 Note that people who do not spend at least half their time in any single location (e.g., those who divide their time evenly across three or more locations) are not eligible for selection.
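The within-household selection rules described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the survey's field algorithm: the `Person` record and its fields (`mother_id`, `caregiver_id`), the helper names, and the simplification that a resident mother can always report on her child are all hypothetical, and residence-based eligibility screening is assumed to have been applied to the roster already. The last function shows one way to turn the CPS-based estimate into an expected count: if the RSA is the PCG about half the time, a household with a PCG yields on average 0.5 × 1 + 0.5 × 2 = 1.5 distinct RSA/PCG respondents.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    """Hypothetical roster entry; field names are illustrative, not L.A.FANS variables."""
    pid: int
    age: int
    mother_id: Optional[int] = None     # biological or adoptive mother
    caregiver_id: Optional[int] = None  # actual primary caregiver, if not the mother


def pcg_of(child: Person, by_id: dict) -> Optional[Person]:
    """Mother if she lives in the household, otherwise the actual primary caregiver."""
    # Assumption: a resident mother is always able to answer questions about the child.
    if child.mother_id in by_id:
        return by_id[child.mother_id]
    return by_id.get(child.caregiver_id)


def select_respondents(roster: list, rng: random.Random) -> dict:
    """Sketch of RSA, RSC, PCG, and SIB selection from an already-screened roster."""
    by_id = {p.pid: p for p in roster}
    adults = [p for p in roster if p.age >= 18]
    children = [p for p in roster if p.age < 18]
    out = {"RSA": rng.choice(adults) if adults else None,
           "RSC": None, "PCG": None, "SIB": None, "emancipated_minor": False}
    if not children:
        return out
    rsc = rng.choice(children)
    out["RSC"] = rsc
    pcg = pcg_of(rsc, by_id)  # the PCG's age plays no role in selection
    if pcg is None:
        # No caregiver in the household: the RSC is treated as an emancipated minor.
        out["emancipated_minor"] = True
        return out
    out["PCG"] = pcg
    # Eligible siblings share the RSC's mother and have the same PCG.
    sibs = [c for c in children
            if c.pid != rsc.pid
            and rsc.mother_id is not None
            and c.mother_id == rsc.mother_id
            and pcg_of(c, by_id) == pcg]
    if sibs:
        out["SIB"] = rng.choice(sibs)
    return out


def expected_rsa_pcg_respondents(p_rsa_is_pcg: float) -> float:
    """Distinct RSA/PCG respondents per household with a PCG: 1 if same person, else 2."""
    return p_rsa_is_pcg * 1 + (1 - p_rsa_is_pcg) * 2
```

Passing a seeded `random.Random` makes a given household's draw reproducible, which is convenient when checking the logic on test rosters.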

In Table 7 we indicate which questionnaire modules are completed by each respondent, and in Table 8 we briefly summarize the content of each questionnaire module. Any adult resident of a sampled household can complete the roster. This module must be completed first because it is used to sample respondents in the household. The roster provides a list of all household members, their relationships with each other, and basic demographic information.


Table 7. Questionnaire Modules to Be Completed by Each Respondent

                                      Respondent
Questionnaire        Any Adult   RSA   PCG   RSC   SIB
Roster                   X
Household                         X
Adult                             X     X
Primary Caregiver                       X
Parent-Child                            X
Child                                         X     X
Assessments                             X     X     X

The RSA completes the household questionnaire, which asks about family income and assets. A second household questionnaire module is completed if the RSA and PCG belong to different nuclear family units within the same household. The RSA and PCG both complete an Adult questionnaire, which asks for the respondent's own demographic, social, and economic characteristics and basic information on non-interviewed spouses. On average, we expect 1.5 adults in each household to complete the Adult questionnaire. The PCG completes several additional modules: the Primary Caregiver module, which collects behavioral and psychological information on the respondent through a largely self-administered questionnaire; the Parent-Child module, which collects detailed information on the family background and the lives of the RSC and the SIB; and the Passage Comprehension test from the Woodcock-Johnson Psycho-Educational Battery (Revised). The RSC and SIB each complete a self-administered Child questionnaire if they are 9 years of age or older, with a more extensive set of questions asked of children aged 12 and older. Children aged 3 to 5 years will complete the Letter-Word Identification and Applied Problems tests from the Woodcock-Johnson battery, and children aged 6 to 17 years will complete these two tests plus the Passage Comprehension test. All questionnaire modules and testing materials are available in English and Spanish.
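Table 7 and the age rules in the preceding paragraph amount to a lookup table plus a few age cut-offs. The sketch below encodes them; the role keys, the function names, and the assumption that children under 3 receive no assessment are ours, the PCG's "Assessments" entry stands for the Passage Comprehension test only, and the second household module for an RSA and PCG in different nuclear families is not modeled.

```python
# Modules by respondent role, transcribed from Table 7.
MODULES_BY_ROLE = {
    "ANY_ADULT": {"Roster"},
    "RSA": {"Household", "Adult"},
    "PCG": {"Adult", "Primary Caregiver", "Parent-Child", "Assessments"},
    "RSC": {"Child", "Assessments"},
    "SIB": {"Child", "Assessments"},
}


def modules_for(roles: set) -> set:
    """Union of modules for one person; an RSA who is also the PCG answers Adult once."""
    mods = set()
    for role in roles:
        mods |= MODULES_BY_ROLE[role]
    return mods


def child_instruments(age: int) -> dict:
    """Child questionnaire version and Woodcock-Johnson tests for an RSC or SIB of this age."""
    if age >= 12:
        saq = "Child SAQ (extended)"
    elif age >= 9:
        saq = "Child SAQ"
    else:
        saq = None  # no self-administered questionnaire under age 9
    if 3 <= age <= 5:
        tests = ["Letter-Word Identification", "Applied Problems"]
    elif 6 <= age <= 17:
        tests = ["Letter-Word Identification", "Applied Problems",
                 "Passage Comprehension"]
    else:
        tests = []  # assumption: no assessments outside ages 3-17
    return {"questionnaire": saq, "tests": tests}
```

Taking the union of role sets is what makes the common case of an RSA who is also the PCG come out right: that person completes the Adult module once plus every PCG-specific module.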


Table 8. Summary of Questionnaire Content

Questionnaire        Content
Roster               List of HH members, relationships among HH members, basic demographic characteristics
Household            Family income and assets
Adult                Demographic, social, economic characteristics; basic info on non-interviewed spouses
Primary Caregiver    Behavioral and psychological information (e.g., depression and self-esteem)
Parent-Child         PCG's reports on the RSC and SIB
Child                SAQ info about child
Assessments          Reading and mathematics reasoning

Follow-Up Waves

To help understand the causal effects of neighborhoods on children and families, it is necessary to collect longitudinal data at all three levels: children, families, and neighborhoods. This will allow researchers to capture changes in family composition or neighborhood characteristics over time and to model the process of neighborhood choice. Families choose the neighborhood in which to raise their children through the process of residential mobility. Moreover, high levels of residential mobility, in Los Angeles County and around the country, can change the characteristics of neighborhoods in a relatively short period of time. Because decennial census data are collected only once every ten years, they are not well suited to picking up these rapid changes.

Figure 2 illustrates the main ideas guiding the design of the study in subsequent waves. The design essentially combines two studies in one: a panel study of the main respondents and a repeated cross-sectional study of residents in each sampled community. In each wave, we will interview sampled respondents who remain in the neighborhood as well as those who have left. We will also select a sample of "new entrants" into the neighborhood, that is, people who have moved into the neighborhood between the preceding wave and the current wave. Thus, at each wave we will have a representative sample of all neighborhood residents. The new entrants become part of the sample and will be followed in subsequent waves.
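The "two studies in one" design can be pictured with set operations on respondent IDs. In this hypothetical sketch (the function and argument names are ours, not the survey's data structures), stayers and sampled new entrants together form the current cross-section of a neighborhood, while everyone who has ever been sampled, including movers, remains in the panel. Attrition and sampling weights, which a real representative cross-section would need, are ignored.

```python
def wave_samples(prev_panel: set, resident_now: set, new_entrants: set) -> dict:
    """Split a wave's respondents into the panel and the neighborhood cross-section.

    prev_panel: IDs of everyone sampled in earlier waves (assumed still participating).
    resident_now: IDs of prior panel members still living in the sampled neighborhood.
    new_entrants: IDs of people sampled this wave after moving (or being born) in.
    """
    stayers = prev_panel & resident_now
    movers = prev_panel - resident_now       # followed wherever they go
    panel = prev_panel | new_entrants        # new entrants join the panel
    cross_section = stayers | new_entrants   # current residents of the neighborhood
    return {"stayers": stayers, "movers": movers,
            "panel": panel, "cross_section": cross_section}
```

For example, with prior respondents `{"a", "b", "c"}`, of whom `"c"` has moved away, and one new entrant `"d"`, the panel is all four people while the neighborhood cross-section is `{"a", "b", "d"}`.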

Figure 2. Design of Follow-Up Waves


The RSCs and RSAs are considered our primary respondents, and once they join the sample they will be followed throughout the study, whether they live together or apart. We plan to interview these sampled individuals wherever they move. For this reason, detailed contact information is collected for each respondent at Wave 1, and we will remain in contact with respondents between waves. Respondents who remain in Los Angeles County will be interviewed in person in subsequent waves, regardless of the neighborhood in which they live. Those who move out of Los Angeles County will be interviewed by telephone, even if they leave the country. Other adults and children will be interviewed depending on whether or not they live with the RSCs.7 We expect most moves to be within Los Angeles County or to other locations in California, although there will be some moves elsewhere in the U.S. and overseas. Table 9 shows migration rates from the 1990 Census covering all moves between 1985 and 1990, except moves overseas. Eighty-four percent of 1985 Los Angeles County residents remained in the county in 1990, although 43 percent of these people had changed residence at least once. Based on Los Angeles County respondents from the 1996-97 CPS, annual housing turnover rates may be as high as 25 percent.
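The follow-up rules can be summarized as a small decision function. This is a hypothetical sketch: the role labels and the function name are ours, we assume the in-person versus telephone rule applies to PCGs and siblings as well as to primary respondents, and how the SIB's limited information is gathered is not modeled. The PCG and SIB rules follow footnote 7.

```python
def followup_contact(role: str, in_la_county: bool, lives_with_rsc: bool) -> str:
    """Hypothetical sketch of follow-up interview rules for one person."""
    mode = "in person" if in_la_county else "telephone"
    if role in ("RSA", "RSC", "PCG"):
        # Primary respondents are followed wherever they move; the RSC's
        # current primary caregiver is interviewed even if this is a
        # different person than at the previous wave.
        return mode
    if role == "SIB":
        # Siblings are interviewed only while they live with the RSC;
        # otherwise only limited information is collected.
        return mode if lives_with_rsc else "limited information"
    raise ValueError(f"unknown role: {role}")
```

An RSA who has left Los Angeles County, for instance, is still interviewed, but by telephone.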


Table 9. Migration 1985-1990 for 1985 Residents of Los Angeles County

                             Number    Percent
No move                   3,847,330        48%
Move Elsewhere in LA      2,906,640        36%
Move Elsewhere in CA        808,456        10%
Move Elsewhere in US        468,607         6%
Move Overseas                   N/A        N/A
Total                     8,031,033       100%

Note: Data are from the 1990 Census.
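The two summary figures quoted in the text can be reproduced from the Table 9 counts. The check below assumes that "remained in the county" means the "No move" and "Move Elsewhere in LA" rows combined; the variable names are ours.

```python
# Counts from Table 9 (1990 Census; overseas moves are not available).
NO_MOVE = 3_847_330
MOVED_WITHIN_LA = 2_906_640
MOVED_ELSEWHERE_CA = 808_456
MOVED_ELSEWHERE_US = 468_607
TOTAL = 8_031_033

# The four categories sum exactly to the reported total.
assert NO_MOVE + MOVED_WITHIN_LA + MOVED_ELSEWHERE_CA + MOVED_ELSEWHERE_US == TOTAL

stayed_in_county = NO_MOVE + MOVED_WITHIN_LA
share_stayed = stayed_in_county / TOTAL
share_moved_among_stayers = MOVED_WITHIN_LA / stayed_in_county

print(f"Still in Los Angeles County in 1990: {share_stayed:.0%}")
print(f"Of these, changed residence at least once: {share_moved_among_stayers:.0%}")
```

Running it prints 84% and 43%, matching the figures given in the text.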


In follow-up waves, new entrants to sampled neighborhoods will be selected using several different methods. Once they enter the sample, all new entrants will remain in the study and will be tracked in subsequent waves according to the same rules used to follow respondents from the baseline. In Wave 2, to be fielded two years after the baseline, new household members in study households that remain in the 65 neighborhoods, as well as new entrant households, will be identified and sampled. The general principle guiding the selection of new entrants is that everyone moving into or being born into the tract will have a positive probability of being selected as an RSA or RSC.

We will identify new entrant households using a vacancy replacement method, in which eligible households from the previous wave are screened for a complete turnover of residents. Eligible households include those whose residents were selected and interviewed in the previous wave but subsequently moved; households that were split into two or more units; previously vacant units; and households that refused to participate or were ineligible. Eligible households do not include those that were screened out of the sample because they had no children. Eligible households will be sampled using a 70-30 split that favors households with children.
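The eligibility screen and the 70-30 split can be sketched as follows. The status labels are hypothetical; treating a previously vacant unit that is now occupied as a "complete turnover" is our reading; and we interpret the 70-30 split as allocating 70 percent of sampled households to those with children, which the text does not spell out. Integer arithmetic is used so that the rounding (half up) is exact.

```python
from enum import Enum, auto


class PriorStatus(Enum):
    """Hypothetical labels for a dwelling's status at the previous wave."""
    INTERVIEWED_THEN_MOVED = auto()
    SPLIT_INTO_UNITS = auto()
    VACANT = auto()
    REFUSED = auto()
    INELIGIBLE = auto()
    SCREENED_OUT_NO_CHILDREN = auto()
    INTERVIEWED_STILL_PRESENT = auto()


# Statuses eligible for vacancy-replacement screening, as listed in the text.
ELIGIBLE = {PriorStatus.INTERVIEWED_THEN_MOVED, PriorStatus.SPLIT_INTO_UNITS,
            PriorStatus.VACANT, PriorStatus.REFUSED, PriorStatus.INELIGIBLE}


def is_new_entrant_household(prior: PriorStatus, complete_turnover: bool) -> bool:
    """An eligible dwelling counts only if all current residents are new since the last wave."""
    return prior in ELIGIBLE and complete_turnover


def allocate_70_30(n: int) -> tuple:
    """Split n sampled households 70/30 toward households with children."""
    with_children = (7 * n + 5) // 10  # 70 percent, rounded half up
    return with_children, n - with_children
```

Note that a dwelling screened out at the previous wave for having no children never enters the pool, whatever its current occupants.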

In Wave 3 we will relist all dwelling units in the 65 sampled neighborhoods in order to identify new housing units. A sample of new entrants to the neighborhood, living in both new and existing housing, will be selected. Wave 3 will also include a sample of new members in existing study households. Table 10 provides estimates of the total number of respondents at each wave.

7 Specifically, the RSC's Primary Caregiver will be interviewed even if this is a different person from the one interviewed in previous waves. The SIB will be interviewed only if he or she still lives with the RSC; otherwise, we will collect limited information on the SIB.


Table 10. L.A.FANS Estimated Total Sample Sizes

                    Wave 1   Wave 2   Wave 3
Households           3,250    3,401    3,788
RSAs                 3,250    3,439    3,850
RSCs + Siblings      3,624    3,881    4,394
PCGs                 2,275    2,345    2,594
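Two ratios implied by Table 10 connect it to the selection rules described earlier. Assuming each household with a sampled child contributes exactly one PCG (ignoring emancipated minors), 2,275 of 3,250 Wave 1 households, or 70 percent, contain a sampled child, and 3,624 / 2,275 ≈ 1.59 sampled children per PCG is close to the 1.6 implied by an eligible sibling in about 60 percent of households with children. The sketch below computes both ratios for each wave; it is a consistency check under that assumption, not part of the survey's estimation.

```python
# Estimated sample sizes transcribed from Table 10.
TABLE_10 = {
    "Wave 1": {"households": 3250, "rsas": 3250, "children": 3624, "pcgs": 2275},
    "Wave 2": {"households": 3401, "rsas": 3439, "children": 3881, "pcgs": 2345},
    "Wave 3": {"households": 3788, "rsas": 3850, "children": 4394, "pcgs": 2594},
}


def implied_ratios(wave: str) -> tuple:
    """Share of households with a PCG, and sampled children (RSCs + SIBs) per PCG."""
    n = TABLE_10[wave]
    return n["pcgs"] / n["households"], n["children"] / n["pcgs"]


for wave in TABLE_10:
    share_pcg, kids_per_pcg = implied_ratios(wave)
    print(f"{wave}: {share_pcg:.1%} of households with a PCG, "
          f"{kids_per_pcg:.2f} sampled children per PCG")
```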

4. Conclusion

The Los Angeles Family and Neighborhood Survey is an important new resource for policymakers with an interest in Los Angeles and for researchers in the social and behavioral sciences. It will be especially useful for researchers who seek to understand the effects of neighborhoods on child development, migration and residential mobility, and welfare reform. It will also serve as a more general resource for researchers interested in a variety of other topics. It will complement data from other planned or ongoing studies, both local and national, such as the Project on Human Development in Chicago Neighborhoods, the Study on Welfare Reform and Children in Three Cities (Boston, Chicago, and San Antonio), the Panel Study of Income Dynamics Child Development Supplement, the National Longitudinal Survey of Youth Child Supplement, the new 1997 National Longitudinal Survey of Youth, and the Fragile Families and Child Well-Being Project (see Brooks-Gunn et al., 2000). Together with these studies, L.A.FANS will help to provide a new nationwide understanding of a number of important policy and research issues.




References

Billy, John O.G., and David E. Moore. 1992. “A multilevel analysis of marital and nonmarital fertility in the United States,” Social Forces 70: 977-1011.

Brooks-Gunn, Jeanne, Lisa J. Berlin, Tama Leventhal, and Allison Sidle Fuligni. 2000. “Depending on the kindness of strangers: Current national data initiatives and developmental research,” Child Development 71: 257-268.

Cohen, Michael P. 1998. “Determining sample sizes for surveys with data analyzed by hierarchical linear models,” Journal of Official Statistics 14: 267-275.

Coleman, J.S. 1988. “Social capital in the creation of human capital,” American Journal of Sociology 94: S95-S120.

Crane, J.S. 1991. “The epidemic theory of ghettos and neighborhood effects on dropping out and teenage childbearing,” American Journal of Sociology 96: 1226-1259.

Duncan, G.J., and S.W. Raudenbush. 1998. “Neighborhoods and adolescent development: How can we determine the links,” Joint Center for Poverty Research Working Paper No. 3.

Duncan, G.J., J. Brooks-Gunn, and P. Klebanov. 1994. “Economic deprivation and early childhood development,” Child Development 65: 296-318.

Efron, Bradley, and Robert J. Tibshirani. 1993. An introduction to the bootstrap. New York: Chapman and Hall.

Entwisle, B., and W.M. Mason. 1985. “Multilevel effects of socioeconomic development and family planning programs on children ever born,” American Journal of Sociology 91: 616-649.

Entwisle, B., W.M. Mason, and A.I. Hermalin. 1986. “The multilevel dependence of contraceptive use on socioeconomic development and family planning program strength,” Demography 23: 199-216.

Furstenberg, Frank F., and Mary E. Hughes. 1997. “The influence of neighborhoods on children’s development: A theoretical perspective and a research agenda,” in Jeanne Brooks-Gunn, Greg J. Duncan, and J. Lawrence Aber (eds.), Neighborhood poverty: Policy implications in studying neighborhoods, Volume II. New York: Russell Sage.

Garner, C., and S. Raudenbush. 1991. “Neighborhood effects on educational attainment: A multilevel analysis,” Sociology of Education 64: 251-262.

Gephart, Martha A. 1997. “Neighborhoods and communities as contexts for development,” in Jeanne Brooks-Gunn, Greg J. Duncan, and J. Lawrence Aber (eds.), Neighborhood poverty: Policy implications in studying neighborhoods, Volume I. New York: Russell Sage.

Jencks, C., and S. Mayer. 1990. “The social consequences of growing up in a poor neighborhood,” in L.E. Lynn and M.F.H. McGeary (eds.), Inner-city poverty in the United States. Washington: National Academy Press.

Kish, L. 1965. Survey Sampling. New York: John Wiley & Sons, Inc.

Lee, B.A., R.S. Oropesa, and J.W. Kanan. 1994. “Neighborhood context and residential mobility,” Demography 31: 249-270.

Liang, K.Y., and S.L. Zeger. 1986. “Longitudinal data analysis using generalized linear models,” Biometrika 73: 13-22.

Mok, M. 1995. “Sample size requirements for 2-level designs in educational research,” Multilevel Modeling Newsletter 7(2): 11-15.

O’Campo, P., A.C. Gielen, R.R. Faden, X. Xue, N. Kass, and M. Wang. 1995. “Violence by male partners against women during the childbearing year: A contextual analysis,” American Journal of Public Health 85: 1092-1097.

Pebley, A.R., N. Goldman, and G. Rodriguez. 1996. “Prenatal and delivery care and childhood immunization in Guatemala: Do family and community matter?” Demography 33: 197-210.

Sampson, R.J., and W.B. Groves. 1989. “Community structure and crime: Testing social disorganization theory,” American Journal of Sociology 94: 774-802.

Sampson, Robert J., and Jeffrey D. Morenoff. 1997. “Ecological perspectives on neighborhood context of urban poverty: Past and present,” in Jeanne Brooks-Gunn, Greg J. Duncan, and J. Lawrence Aber (eds.), Neighborhood poverty: Policy implications in studying neighborhoods, Volume II. New York: Russell Sage.

Sampson, Robert J., Stephen W. Raudenbush, and Felton Earls. 1997. “Neighborhoods and violent crime: A multilevel study of collective efficacy,” Science 277: 918-924.

Sastry, Narayan. 1996. “Community characteristics, individual and household attributes, and child survival in Brazil,” Demography 33: 211-229.

Snijders, T.A.B., and R.J. Bosker. 1993. “Standard errors and sample sizes for two-level research,” Journal of Educational Statistics 18: 237-259.

Wilson, W.J. 1987. The truly disadvantaged: The inner city, the underclass, and public policy. Chicago: University of Chicago Press.

Wilson, W.J. 1996. When work disappears: The world of the new urban poor. New York: Knopf.

Wong, G.Y., and W.M. Mason. 1985. “The hierarchical logistic regression model for multilevel analysis,” Journal of the American Statistical Association 80: 513-524.

Wong, G.Y., and W.M. Mason. 1991. “Contextually specific effects and other generalizations of the hierarchical linear model for comparative analysis,” Journal of the American Statistical Association 86: 487-503.



Appendix A. Tracts Ineligible for L.A.FANS Sample

A total of 11 tracts representing ships at sea were dropped from the sample. These tracts are identified by codes ending in “99” or “98”.

Table A.1. Ships-at-sea tracts dropped from sample

     Tract           Tract
 1   294999      7   575699
 2   295199      8   575799
 3   296199      9   577699
 4   296299     10   620099
 5   297199     11   702999
 6   555198

A total of 13 tracts were dropped because a high percentage (over 80 percent) of their residents lived in group quarters. Few of these tracts had any children, and many had none.


Table A.2. Tracts with high proportion in group quarters dropped from sample

     Tract           Tract
 1   207400      8   551600
 2   222700      9   574601
 3   265301     10   574700
 4   296100     11   575600
 5   401901     12   701100
 6   402404     13   920200
 7   403200

The following four tracts were dropped from the list of tracts eligible for the sample because one or more key variables were missing.

Table A.3. Tracts with missing variables dropped from sample

     Tract           Tract
 1   320000      3   573500
 2   555101      4   573901

In total, 28 of the 1,652 tracts in Los Angeles County were dropped from the sample, leaving 1,624 tracts eligible for selection for the L.A.FANS sample.
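The three exclusion rules and the resulting count can be expressed as a small filter. The `Tract` record layout and the order in which rules are checked are assumptions for illustration; the suffix rule, the 80 percent threshold, and the tallies are those reported in this appendix.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tract:
    """Hypothetical tract record; field names are illustrative."""
    code: str                   # 6-character tract code, e.g. "294999"
    pct_group_quarters: float   # percent of persons living in group quarters
    key_vars_complete: bool     # True if no key variable is missing


def exclusion_reason(t: Tract) -> Optional[str]:
    """Return why a tract is excluded from the sampling frame, or None if eligible."""
    if t.code.endswith(("99", "98")):
        return "ships at sea"
    if t.pct_group_quarters > 80:
        return "group quarters"
    if not t.key_vars_complete:
        return "missing key variables"
    return None


# Tallies reported in Tables A.1 to A.3.
DROPPED = {"ships at sea": 11, "group quarters": 13, "missing key variables": 4}
TOTAL_TRACTS = 1652
eligible = TOTAL_TRACTS - sum(DROPPED.values())
print(f"Dropped {sum(DROPPED.values())} tracts; {eligible} remain eligible")
```

All eleven codes in Table A.1 satisfy the suffix rule, and none of the codes in Tables A.2 or A.3 do, so the three categories do not overlap for the tracts listed.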




